This report presents an analysis of the effect of depth on the price of diamonds. It is generated automatically from the 'ggplot2::diamonds' dataset.
It is available in HTML, Word, and PDF formats, all compiled from the same R Markdown script, shown below.
The code chunk below writes a .csv file containing a subset of the data (defined by the 'SAMPLE_SIZE' and 'RANDOM_SEED' parameters) and produces a graph that visualizes this subset.
NB: This code is written in such a way that it can be re-used inside a 'for' loop or within an interactive application, where the 'SAMPLE_SIZE' and 'RANDOM_SEED' parameters can be changed automatically (in a loop) or by the user (in an app).
Note the following:
# Below is the code that is pasted from `auto_report_code1.R`
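# Packages used below: data.table (setDT, setcolorder, fwrite), magrittr (%>%), ggplot2 (plots)
library(data.table); library(magrittr); library(ggplot2)
dt <- ggplot2::diamonds %>% setDT %>% .[order(get("price"))] %>% setcolorder("price"); dt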
## price carat cut color clarity depth table x y z
## 1: 326 0.23 Ideal E SI2 61.5 55 3.95 3.98 2.43
## 2: 326 0.21 Premium E SI1 59.8 61 3.89 3.84 2.31
## 3: 327 0.23 Good E VS1 56.9 65 4.05 4.07 2.31
## 4: 334 0.29 Premium I VS2 62.4 58 4.20 4.23 2.63
## 5: 335 0.31 Good J SI2 63.3 58 4.34 4.35 2.75
## ---
## 53936: 18803 2.00 Very Good H SI1 62.8 57 7.95 8.00 5.01
## 53937: 18804 2.07 Ideal G SI2 62.5 55 8.20 8.13 5.11
## 53938: 18806 1.51 Ideal G IF 61.7 55 7.37 7.41 4.56
## 53939: 18818 2.00 Very Good G SI1 63.5 56 7.90 7.97 5.04
## 53940: 18823 2.29 Premium I VS2 60.8 60 8.50 8.47 5.16
# Constants that we want to be able to modify (either automatically in a loop, or manually using an interactive app) ----
SAMPLE_SIZE = 150
RANDOM_SEED = 99; set.seed(RANDOM_SEED)
CLARITY = (dt$clarity %>% unique %>% sort)[1] # "I1"
# Subset data ----
# dt1 <- dt[clarity==CLARITY] [sample(.N, SAMPLE_SIZE)] [order(price)];
# If the data field (e.g. 'clarity') can be modified by the user, it should be coded as shown below
dt1 <- dt[get("clarity")==CLARITY] [sample(.N, SAMPLE_SIZE)] [order(get("price"))];
dt1
## price carat cut color clarity depth table x y z
## 1: 452 0.43 Premium H I1 62.0 59.0 4.78 4.83 2.98
## 2: 468 0.32 Good D I1 64.0 54.0 4.36 4.33 2.78
## 3: 491 0.40 Good F I1 63.3 60.4 4.64 4.68 2.95
## 4: 511 0.39 Very Good E I1 62.8 57.0 4.61 4.66 2.91
## 5: 584 0.50 Fair F I1 69.8 55.0 4.89 4.80 3.38
## ---
## 146: 11548 3.00 Good E I1 64.2 65.0 9.08 8.96 5.79
## 147: 11594 2.72 Ideal H I1 59.6 55.0 9.17 9.13 5.45
## 148: 15223 4.01 Premium J I1 62.5 62.0 10.02 9.94 6.24
## 149: 15984 4.00 Very Good I I1 63.3 58.0 10.01 9.94 6.31
## 150: 18531 4.50 Fair J I1 65.8 58.0 10.23 10.16 6.72
strTitle <- sprintf("Effect of Depth on Price for Clarity '%s' (size=%02g, seed=%02g).csv", CLARITY, SAMPLE_SIZE, RANDOM_SEED)
# Change this to TRUE to start writing to your disk
if (FALSE) {
  fwrite(dt1, strTitle)
}
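# Two ways of plotting variables ----
# First way: column names passed as strings via aes_string()
# (note: aes_string() is deprecated since ggplot2 3.0.0 but still works)
g <- ggplot(dt1) + theme_bw() +
  geom_point(aes_string(x="depth", y="price", col="color", size="carat", shape="cut")) +
  labs(title = strTitle)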
g
## Warning: Using shapes for an ordinal variable is not advised
# Another way to refer to variables inside ggplot functions: using the get() function
g1 <- ggplot(dt1) + theme_bw() +
  geom_line(aes(x=get("depth"), y=get("price"), col=get("color"))) +
  geom_point(aes(x=get("depth"), y=get("price"), col=get("color"), size=get("carat"))) +
  facet_grid(get("cut") ~ .) +
  labs(title = strTitle)
g1
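The NB at the top says this code can be re-used inside a 'for' loop. A minimal sketch of that idea follows. It assumes the subsetting step is wrapped in a hypothetical helper `subset_diamonds()` (not part of `auto_report_code1.R`), that the data.table and ggplot2 packages are installed, and that the parameter values shown are purely illustrative.

```r
library(data.table)

# Hypothetical helper (not in the original script): draws a reproducible
# sample of up to `sample_size` rows with the given clarity, sorted by price.
subset_diamonds <- function(dt, clarity_level, sample_size, random_seed) {
  set.seed(random_seed)
  rows <- dt[get("clarity") == clarity_level]
  # Guard against asking for more rows than this clarity level has
  n_rows <- min(sample_size, nrow(rows))
  rows[sample(.N, n_rows)][order(get("price"))]
}

dt <- as.data.table(ggplot2::diamonds)

# Illustrative parameter combinations (every size paired with every seed)
param_grid <- CJ(sample_size = c(50, 150), random_seed = c(1, 99))

# Re-run the same subsetting code once per combination
results <- vector("list", nrow(param_grid))
for (i in seq_len(nrow(param_grid))) {
  results[[i]] <- subset_diamonds(
    dt, "I1", param_grid$sample_size[i], param_grid$random_seed[i]
  )
}
```

Setting the seed inside the helper keeps each subset reproducible regardless of the order in which the combinations are processed.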
Instead of putting the R code directly in the R Markdown file, you can keep it in a separate file and call it from there using source("auto_report_code1.R").
For the HTML report, the above tables and graphs can be converted into interactive ones with a single line of code!
An interactive table lets you browse and extract data from large, complex datasets efficiently. You can sort the table and filter it by values.
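If the report is parameterized, the same R Markdown script can also be rendered once per parameter combination. The sketch below is only an illustration: the file name `auto_report.Rmd` and the parameter names `sample_size` and `random_seed` are assumptions (the source does not name the .Rmd file or declare `params`), and `rmarkdown::render()` is called only if that file exists.

```r
# Assumed (hypothetical) report file; it must declare `params` in its YAML header
rmd_file <- "auto_report.Rmd"

# Illustrative parameter combinations
param_grid <- expand.grid(sample_size = c(50, 150), random_seed = c(1, 99))

# One output file name per parameter combination
param_grid$output_file <- sprintf(
  "auto_report_size%d_seed%d.html",
  as.integer(param_grid$sample_size), as.integer(param_grid$random_seed)
)

# Render only when the assumed file and the rmarkdown package are available
if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  for (i in seq_len(nrow(param_grid))) {
    rmarkdown::render(
      rmd_file,
      output_format = "html_document",
      output_file = param_grid$output_file[i],
      params = list(
        sample_size = param_grid$sample_size[i],
        random_seed = param_grid$random_seed[i]
      )
    )
  }
}
```

Passing `output_format = "all"` instead renders every format declared in the file's YAML header, which is one way to obtain the HTML, Word, and PDF versions from a single script.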
dt1 %>% DT::datatable(
  rownames = FALSE, filter = "top",
  extensions = 'Buttons',
  options = list(
    dom = 'Blfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)
An interactive plot lets you analyze and visualize large, complex datasets efficiently, for example by zooming and by hovering over points to inspect their values.
plotly::ggplotly(g)
## Warning: Using shapes for an ordinal variable is not advised
plotly::ggplotly(g1)
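Interactive widgets such as these only work in HTML output, so a script that is also compiled to Word and PDF needs a static fallback. A minimal sketch, assuming a hypothetical helper `show_plot()` (not in the original script) that uses `knitr::is_html_output()` to detect the target format:

```r
library(ggplot2)

# Hypothetical helper: interactive plot for HTML output, static plot otherwise
show_plot <- function(p) {
  if (knitr::is_html_output() && requireNamespace("plotly", quietly = TRUE)) {
    plotly::ggplotly(p)
  } else {
    p
  }
}

# Small illustrative plot on the first 100 rows of the dataset
p <- ggplot(ggplot2::diamonds[1:100, ], aes(x = depth, y = price)) +
  geom_point() +
  theme_bw()

show_plot(p)
```

Outside of knitting (for example in an interactive R session), `knitr::is_html_output()` returns FALSE, so the helper returns the static ggplot object.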